Evaluation of Fast K-nearest Neighbors Search Methods Using Real Data Sets

نویسنده

  • Yi-Ching Liaw
چکیده

The problem of k-nearest neighbors (kNN) search is to find nearest k neighbors from a given data set for a query point. To speed up the finding process of nearest k neighbors, many fast kNN search algorithms were proposed. The performance of fast kNN search algorithms is highly influenced by the number of dimensions, number of data points, and data distribution of a data set. In the extreme case, the performance of a fast kNN search algorithm may be poorer than the full search algorithm. To help understand the performance of fast kNN search algorithms on data sets of different types, five fast algorithms were tested in this paper using multiple real data sets. The experimental results of the paper will be very useful in choosing a fast kNN search algorithm for an unknown data set.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...

متن کامل

Adaptive k-Nearest-Neighbor Classification Using a Dynamic Number of Nearest Neighbors

Classification based on k-nearest neighbors (kNN classification) is one of the most widely used classification methods. The number k of nearest neighbors used for achieving a high accuracy in classification is given in advance and is highly dependent on the data set used. If the size of data set is large, the sequential or binary search of NNs is inapplicable due to the increased computational ...

متن کامل

Fast k-NN classification rule using metric on space-filling curves

A fast nearest neighbor algorithm for pattern classiication is proposed and tested on real data. The patterns (points in d-dimensional Euclidean space) are sorted along a space-lling curve. This way the multidi-mensional problem is compressed to the simplest case of the nearest neighbor search in one dimension.

متن کامل

Comparison and evaluation of the performance of data-driven models for estimating suspended sediment downstream of Doroodzan Dam

Dams control most of the sediment entering the reservoir by creating static environments. However, sediment leaving the dam depends on various factors such as dam management method, inlet sediment, water height in the reservoir, the shape of the reservoir, and discharge flow. In this research, the amount of suspended sediment of Doroodzan Dam based on a statistical period of 25 years has been i...

متن کامل

Diagnosis of Heart Disease Using Binary Grasshopper Optimization Algorithm and K-Nearest Neighbors

Introduction: The heart is one of the main organs of the human body, and its unhealthiness is an important factor in human mortality. Heart disease may be asymptomatic, but medical tests can predict and diagnose it. Diagnosis of heart disease requires extensive experience of specialist physicians. The aim of this study is to help physicians diagnose heart disease based on hybrid Binary Grasshop...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011